Implement Hierarchical Mutual Information Distance for phylogenetic tree comparison#154

Closed
Copilot wants to merge 4 commits into main from copilot/fix-153

Copilot AI commented Sep 6, 2025

This PR implements the hierarchical mutual information distance of Perotti et al. (2015) for comparing phylogenetic trees, as requested in issue #153.

New Functionality

The implementation adds a new HierarchicalMutualInfoDist() function that measures the distance between two phylogenetic trees while taking their nested (hierarchical) structure into account. Conventional mutual information measures, by contrast, compare only flat partitions.

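library(TreeTools)  # BalancedTree(), PectinateTree()
library(TreeDist)   # HierarchicalMutualInfoDist(), added by this PR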

# Basic usage
tree1 <- BalancedTree(8)
tree2 <- PectinateTree(8)
HierarchicalMutualInfoDist(tree1, tree2)
#> [1] 1.86365

# Normalized distance (0-1 range)
HierarchicalMutualInfoDist(tree1, tree2, normalize = TRUE)
#> [1] 0.6666667

# Self-distance is always 0
HierarchicalMutualInfoDist(tree1, tree1)
#> [1] 0

Algorithm Details

The function follows the established patterns in TreeDist:

  • Accepts phylo objects and lists of trees
  • Supports normalization and reporting of matchings
  • Handles edge cases (different tip labels, too few tips)
  • Returns distances in bits (base-2 logarithms), as requested in the issue

The algorithm works by:

  1. Converting phylogenetic trees to hierarchical partitions based on splits
  2. Calculating hierarchical information content using Shannon entropy with hierarchical weighting
  3. Computing shared hierarchical information through split compatibility analysis
  4. Returning the distance H(X) + H(Y) − 2·I(X,Y), where H(X) and H(Y) are the hierarchical information contents of the two trees (step 2) and I(X,Y) is their hierarchical mutual information (step 3)
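To make the four steps above concrete, here is a minimal base-R sketch of the recursive hierarchical mutual information as we read it from Perotti et al. (2015), together with the step-4 distance. It is not the code in this PR and does not use the TreeDist or TreeTools APIs. The helper names (`Tips`, `Kids`, `MutualInfoBits`, `HMI`, `HMIDistance`) are hypothetical, and trees are assumed to be nested lists whose leaves are single character tip labels, standing in for phylo objects. At each pair of nodes, the sketch adds the mutual information between the partitions that their children induce on the shared tips. It then recurses into every pair of children that share tips, weighting each by the fraction of shared tips they hold.

```r
# Sketch only: hypothetical helper names, base R, logs in base 2 (bits).
# Assumption: a tree is a nested list whose leaves are single tip labels,
# e.g. list(list("a", "b"), list("c", "d")).

Tips <- function(node) as.character(unlist(node))
Kids <- function(node) if (is.list(node)) node else list()

# Mutual information (bits) of a contingency table of shared-tip counts
MutualInfoBits <- function(counts) {
  p <- counts / sum(counts)
  expected <- outer(rowSums(p), colSums(p))
  nz <- p > 0
  sum(p[nz] * log2(p[nz] / expected[nz]))
}

# I(t, s) = I_ts + sum over child pairs of P(t', s' | t, s) * I(t', s'),
# where I_ts is the mutual information between the partitions that the
# children of t and of s induce on the tips shared by t and s.
HMI <- function(t, s) {
  nShared <- length(intersect(Tips(t), Tips(s)))
  kidsT <- Kids(t)
  kidsS <- Kids(s)
  if (nShared < 2 || length(kidsT) == 0 || length(kidsS) == 0) return(0)

  # counts[i, j]: number of tips in both child i of t and child j of s
  counts <- matrix(0, length(kidsT), length(kidsS))
  for (i in seq_along(kidsT)) {
    for (j in seq_along(kidsS)) {
      counts[i, j] <- length(intersect(Tips(kidsT[[i]]), Tips(kidsS[[j]])))
    }
  }

  total <- MutualInfoBits(counts)
  for (i in seq_along(kidsT)) {
    for (j in seq_along(kidsS)) {
      if (counts[i, j] > 0) {
        # Weight each child pair by its share of the common tips
        total <- total + counts[i, j] / nShared * HMI(kidsT[[i]], kidsS[[j]])
      }
    }
  }
  total
}

# Hierarchical entropy H(T) = I(T, T); distance H(X) + H(Y) - 2 I(X, Y)
HMIDistance <- function(x, y) HMI(x, x) + HMI(y, y) - 2 * HMI(x, y)

balanced  <- list(list("a", "b"), list("c", "d"))
pectinate <- list("a", list("b", list("c", "d")))
HMI(balanced, balanced)          # 2 bits, i.e. log2(4)
HMIDistance(balanced, balanced)  # 0
HMIDistance(balanced, pectinate)
```

In this sketch, a tree whose leaves are single tips always has H(T) = log2(n) by the chain rule of entropy. Differences in tree shape therefore enter the distance only through I(X, Y).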

Mathematical Properties

The implementation satisfies key mathematical properties:

  • Non-negativity: All distances ≥ 0
  • Identity: Distance of a tree with itself = 0
  • Symmetry: Distance(A,B) = Distance(B,A)
  • Proper normalization: 0 ≤ normalized distance ≤ 1
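The properties listed above can be checked numerically. As a minimal illustration, the base-R sketch below runs such checks on random inputs for the flat-partition analogue of the step-4 formula, the variation of information H(X) + H(Y) − 2·I(X,Y). It is not the PR's test suite. The helper names are hypothetical, and dividing by H(X) + H(Y) is only one possible normalization, which may differ from the one this PR uses.

```r
# Sketch only: hypothetical helper names; flat partitions, not trees.
# Partitions are given as vectors of block labels, one label per element.

EntropyBits <- function(labels) {
  p <- table(labels) / length(labels)
  -sum(p * log2(p))
}

MutualInfoBits <- function(x, y) {
  p <- table(x, y) / length(x)
  expected <- outer(rowSums(p), colSums(p))
  nz <- p > 0
  sum(p[nz] * log2(p[nz] / expected[nz]))
}

# Flat analogue of the distance: H(X) + H(Y) - 2 I(X, Y), in bits
VariationOfInfo <- function(x, y) {
  EntropyBits(x) + EntropyBits(y) - 2 * MutualInfoBits(x, y)
}

# Assumed normalization: divide by H(X) + H(Y), which bounds the distance
NormalizedVI <- function(x, y) {
  denom <- EntropyBits(x) + EntropyBits(y)
  if (denom == 0) 0 else VariationOfInfo(x, y) / denom
}

set.seed(1)
tol <- 1e-9
for (i in 1:100) {
  x <- sample(1:4, 20, replace = TRUE)
  y <- sample(1:3, 20, replace = TRUE)
  stopifnot(
    VariationOfInfo(x, y) >= -tol,                             # non-negativity
    abs(VariationOfInfo(x, x)) < tol,                          # identity
    abs(VariationOfInfo(x, y) - VariationOfInfo(y, x)) < tol,  # symmetry
    NormalizedVI(x, y) >= -tol,                                # lower bound
    NormalizedVI(x, y) <= 1 + tol                              # upper bound
  )
}
```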

Testing

A test suite of 32 test cases covers:

  • Basic functionality and edge cases
  • Different tree types (balanced, pectinate, star trees)
  • Normalization behavior
  • List processing for pairwise comparisons
  • Mathematical property verification

All tests pass and follow existing TreeDist testing patterns.

Documentation

  • Complete roxygen2 documentation with algorithm description in plain English
  • Mathematical correctness checklist for verification
  • Examples demonstrating usage with different tree types

Fixes #153.



Copilot AI and others added 3 commits September 6, 2025 16:51
Co-authored-by: ms609 <1695515+ms609@users.noreply.github.com>
…ionality

Co-authored-by: ms609 <1695515+ms609@users.noreply.github.com>
…ll test coverage and mathematical correctness checklist

Co-authored-by: ms609 <1695515+ms609@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Hierarchical mutual information distance implementation" to "Implement Hierarchical Mutual Information Distance for phylogenetic tree comparison" on Sep 6, 2025
Copilot AI requested a review from ms609 September 6, 2025 17:02
ms609 closed this in #161 on Sep 25, 2025

Linked issue: Hierarchical mutual information distance implementation

2 participants